Analysis of Global Temperature and Green House Emissions

To Yin Yu

Introduction

Climate change has always been a very important problem to human beings, and as many believes it is driven by greenhouse gases emissions and extensive shifts happening in weather pattern around the world. The aim of this tutorial is to try to understand the relationship between the greenhouse gases emissions and temperature, also analyzing the possible factors that might effect greenhouse gases emissions. The green house gases emissions data we will be using are collected from Our World in Data and data regarding temperature will be collected from Berkeley Earth. Our world in Data is a well-known organization that aim to research on data that can help tackle world's largest problems. For more infomation you can visit https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions. On the other hand, Berkeley Earth aim to supply comprehensive and highly user-accessible data that might help explain climate change problem. For more infomation about them, visit http://berkeleyearth.org/. We will be using data from 2012 to 1990 to perform our analysis.

For this tutorial, pandas, numpy and pycountry library will be used to read and organize the data collected. matplotlib, plotly.express for visualization. For analyzing, SciKit-Learn will be used. This tutorial assumes prior knowledge.

Collect Data

Collect Data is first stage in the data lifecycle. We mainly aimed to gather the data in this stage. To download the green house gases emissions dataset for youself, visit https://github.com/owid/co2-data. For global temperature dataset, you can obtain they here https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data, we will be using the the dataset "GlobalLandTemperaturesByCountry" for this tutorial. All of the above dataset are in CSV format.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import plotly.express as px
import pycountry
In [2]:
#green house gases emissions dataset
greenhouse_emissions = pd.read_csv("Green_house_emission_data.csv")

greenhouse_emissions.head(10)
Out[2]:
iso_code country year co2 co2_growth_prct co2_growth_abs consumption_co2 trade_co2 trade_co2_share co2_per_capita ... ghg_per_capita methane methane_per_capita nitrous_oxide nitrous_oxide_per_capita primary_energy_consumption energy_per_capita energy_per_gdp population gdp
0 AFG Afghanistan 1949 0.015 NaN NaN NaN NaN NaN 0.002 ... NaN NaN NaN NaN NaN NaN NaN NaN 7663783.0 NaN
1 AFG Afghanistan 1950 0.084 475.000 0.070 NaN NaN NaN 0.011 ... NaN NaN NaN NaN NaN NaN NaN NaN 7752000.0 1.949480e+10
2 AFG Afghanistan 1951 0.092 8.696 0.007 NaN NaN NaN 0.012 ... NaN NaN NaN NaN NaN NaN NaN NaN 7840000.0 2.006385e+10
3 AFG Afghanistan 1952 0.092 0.000 0.000 NaN NaN NaN 0.012 ... NaN NaN NaN NaN NaN NaN NaN NaN 7936000.0 2.074235e+10
4 AFG Afghanistan 1953 0.106 16.000 0.015 NaN NaN NaN 0.013 ... NaN NaN NaN NaN NaN NaN NaN NaN 8040000.0 2.201546e+10
5 AFG Afghanistan 1954 0.106 0.000 0.000 NaN NaN NaN 0.013 ... NaN NaN NaN NaN NaN NaN NaN NaN 8151000.0 2.248333e+10
6 AFG Afghanistan 1955 0.154 44.828 0.048 NaN NaN NaN 0.019 ... NaN NaN NaN NaN NaN NaN NaN NaN 8271000.0 2.292989e+10
7 AFG Afghanistan 1956 0.183 19.048 0.029 NaN NaN NaN 0.022 ... NaN NaN NaN NaN NaN NaN NaN NaN 8399000.0 2.395993e+10
8 AFG Afghanistan 1957 0.293 60.000 0.110 NaN NaN NaN 0.034 ... NaN NaN NaN NaN NaN NaN NaN NaN 8535000.0 2.396191e+10
9 AFG Afghanistan 1958 0.330 12.500 0.037 NaN NaN NaN 0.038 ... NaN NaN NaN NaN NaN NaN NaN NaN 8680000.0 2.530744e+10

10 rows × 38 columns

In [3]:
#global average temperature dataset
global_land_temps = pd.read_csv("GlobalLandTemperaturesByCountry.csv")

global_land_temps.head(10)
Out[3]:
dt AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude
0 1743-11-01 6.068 1.737 Ã…rhus Denmark 57.05N 10.33E
1 1743-12-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
2 1744-01-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
3 1744-02-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
4 1744-03-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
5 1744-04-01 5.788 3.624 Ã…rhus Denmark 57.05N 10.33E
6 1744-05-01 10.644 1.283 Ã…rhus Denmark 57.05N 10.33E
7 1744-06-01 14.051 1.347 Ã…rhus Denmark 57.05N 10.33E
8 1744-07-01 16.082 1.396 Ã…rhus Denmark 57.05N 10.33E
9 1744-08-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E

Data Processing

We have loaded in all the data we need to perform the rest of the tutorial now. However, the data is very unorganized and incomprehensive. Hence, we move on to our next stage of the cycle, which is data processing. During this stage, we are aimed to reorganize the data, try to reform it to as comprehensive as possible. This will help prepare for the stage. We will dropping all the columns we are not anticipate to use, possibly renaming the columns or even adding new columns, in order to make the dataset as tidy and organized as possible.

In [4]:
#clearning emissions data
greenhouse_emissions.drop(greenhouse_emissions[(greenhouse_emissions['year'] > 2012)].index, inplace = True)
greenhouse_emissions.drop(greenhouse_emissions[(greenhouse_emissions['year'] < 1990)].index, inplace = True)
greenhouse_emissions = greenhouse_emissions.filter(['iso_code', 'country', 'year', 'co2', 'methane', 'nitrous_oxide'])

greenhouse_emissions.head(20)
Out[4]:
iso_code country year co2 methane nitrous_oxide
41 AFG Afghanistan 1990 2.602 8.97 3.25
42 AFG Afghanistan 1991 2.426 9.07 3.30
43 AFG Afghanistan 1992 1.382 9.00 3.21
44 AFG Afghanistan 1993 1.334 8.90 3.21
45 AFG Afghanistan 1994 1.282 8.97 2.99
46 AFG Afghanistan 1995 1.231 9.15 3.07
47 AFG Afghanistan 1996 1.165 9.93 3.29
48 AFG Afghanistan 1997 1.084 10.60 3.59
49 AFG Afghanistan 1998 1.029 11.10 3.88
50 AFG Afghanistan 1999 0.810 11.87 4.15
51 AFG Afghanistan 2000 0.768 10.59 3.62
52 AFG Afghanistan 2001 0.812 9.36 3.22
53 AFG Afghanistan 2002 1.064 11.21 3.72
54 AFG Afghanistan 2003 1.205 11.56 3.92
55 AFG Afghanistan 2004 0.908 11.47 3.82
56 AFG Afghanistan 2005 1.320 11.68 3.97
57 AFG Afghanistan 2006 1.643 14.89 4.06
58 AFG Afghanistan 2007 2.268 18.10 4.25
59 AFG Afghanistan 2008 4.198 22.19 4.81
60 AFG Afghanistan 2009 6.760 25.43 5.32
In [5]:
#cleaning temperature data

#reorganizing temperature data 
global_land_temps = global_land_temps.rename(columns={'dt': 'Date'})
global_land_temps['Date'] = pd.to_datetime(global_land_temps['Date'] , format='%Y-%m-%d')

global_land_temps.head(15)
Out[5]:
Date AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude
0 1743-11-01 6.068 1.737 Ã…rhus Denmark 57.05N 10.33E
1 1743-12-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
2 1744-01-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
3 1744-02-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
4 1744-03-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
5 1744-04-01 5.788 3.624 Ã…rhus Denmark 57.05N 10.33E
6 1744-05-01 10.644 1.283 Ã…rhus Denmark 57.05N 10.33E
7 1744-06-01 14.051 1.347 Ã…rhus Denmark 57.05N 10.33E
8 1744-07-01 16.082 1.396 Ã…rhus Denmark 57.05N 10.33E
9 1744-08-01 NaN NaN Ã…rhus Denmark 57.05N 10.33E
10 1744-09-01 12.781 1.454 Ã…rhus Denmark 57.05N 10.33E
11 1744-10-01 7.950 1.630 Ã…rhus Denmark 57.05N 10.33E
12 1744-11-01 4.639 1.302 Ã…rhus Denmark 57.05N 10.33E
13 1744-12-01 0.122 1.756 Ã…rhus Denmark 57.05N 10.33E
14 1745-01-01 -1.333 1.642 Ã…rhus Denmark 57.05N 10.33E
In [6]:
#removing unrelated rows(only keeping 1990-2012 data) for temperature data
global_land_temps['Year'] = pd.DatetimeIndex(global_land_temps['Date']).year
global_land_temps.drop(global_land_temps[(global_land_temps['Year'] > 2012)].index, inplace = True)
global_land_temps.drop(global_land_temps[(global_land_temps['Year'] < 1990)].index, inplace = True)

#transform monthlyata into annually data by taking the mean
annual_avg_temp = global_land_temps.groupby(['Year','Country']).mean().reset_index()
exclude = ['Asia', 'Bonaire, Saint Eustatius And Saba']
annual_avg_temp = annual_avg_temp[~annual_avg_temp['Country'].isin(exclude)]

annual_avg_temp.head(15)
Out[6]:
Year Country AverageTemperature AverageTemperatureUncertainty
0 1990 Afghanistan 14.913135 0.494771
1 1990 Albania 16.371083 0.335667
2 1990 Algeria 18.818533 0.470667
3 1990 Angola 22.333861 0.649042
4 1990 Argentina 17.410458 0.375115
5 1990 Armenia 9.097833 0.360750
6 1990 Australia 17.351583 0.268833
7 1990 Austria 7.460117 0.374533
8 1990 Azerbaijan 12.014083 0.436500
9 1990 Bahamas 25.742917 0.263917
10 1990 Bahrain 26.260833 0.706583
11 1990 Bangladesh 25.523000 0.399605
12 1990 Belarus 8.068208 0.268063
13 1990 Belgium 11.185179 0.259667
14 1990 Benin 27.563139 0.442556

As you can see here, our temperature data does not came with an ISO-3 country code, that is needed for merging with emissions data. Hence, we will be using pycountry library here to add a column of ISO-3 country.

In [7]:
#adding country code column
annual_avg_temp['CountryCode'] = annual_avg_temp['Country'].apply(lambda x: pycountry.countries.search_fuzzy(x)[0].alpha_3)

annual_avg_temp.head(10)
Out[7]:
Year Country AverageTemperature AverageTemperatureUncertainty CountryCode
0 1990 Afghanistan 14.913135 0.494771 AFG
1 1990 Albania 16.371083 0.335667 ALB
2 1990 Algeria 18.818533 0.470667 DZA
3 1990 Angola 22.333861 0.649042 AGO
4 1990 Argentina 17.410458 0.375115 ARG
5 1990 Armenia 9.097833 0.360750 ARM
6 1990 Australia 17.351583 0.268833 AUS
7 1990 Austria 7.460117 0.374533 AUT
8 1990 Azerbaijan 12.014083 0.436500 AZE
9 1990 Bahamas 25.742917 0.263917 BHS
In [8]:
#merging temperature data with greenhouse emissions data
Combined_df = annual_avg_temp.merge(greenhouse_emissions, left_on= ['CountryCode', 'Year'], right_on = ['iso_code', 'year'])

#dropping the unnecessary/repeated column 
Combined_df = Combined_df.drop(['country', 'year', 'iso_code','AverageTemperatureUncertainty'], axis=1)
Combined_df = Combined_df.replace('NaN', np.nan)
Combined_df = Combined_df.dropna()

#reorganize the column of the combined data
Combined_df = Combined_df[['Year', 'CountryCode', 'Country', 'AverageTemperature', 'co2', 'methane', 'nitrous_oxide']]

Combined_df.head(10)
Out[8]:
Year CountryCode Country AverageTemperature co2 methane nitrous_oxide
0 1990 AFG Afghanistan 14.913135 2.602 8.97 3.25
1 1990 ALB Albania 16.371083 5.511 3.67 1.59
2 1990 DZA Algeria 18.818533 76.737 22.14 5.00
3 1990 AGO Angola 22.333861 5.089 39.07 18.43
4 1990 ARG Argentina 17.410458 111.890 114.00 37.97
5 1990 ARM Armenia 9.097833 18.116 3.59 0.87
6 1990 AUS Australia 17.351583 278.424 162.19 82.27
7 1990 AUT Austria 7.460117 62.323 11.57 4.90
8 1990 AZE Azerbaijan 12.014083 51.691 16.39 2.84
9 1990 BHS Bahamas 25.742917 1.839 0.24 0.07

Data Visualization

Now our dataset looks much organized and comprehensive, we can move on to the next stage which is data visualization. In this stage, we will try to turn all the nurmeric values into more explanatory graphs. We will also look for potential trends though the graph we general, this will give us a better understand of our data. The type of graph I am using is choropleth world map, with countries that has a higher green house gases emissions painted with darker colors, those with lower emissions with lighter colors.

We will be using the plotly.express library for the choropleth world maps.

In [9]:
# Amount of carbon dioxide(CO2) emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "co2",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd',  
                    animation_frame= "Year")

fig.show()
In [10]:
# Amount of methane emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "methane",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd',  
                    animation_frame= "Year")

fig.show()
In [11]:
# Amount of nitrous oxide emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "nitrous_oxide",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd',  
                    animation_frame= "Year")

fig.show()

After looking at the choropleth maps, we found that there are some countries that has a more notable changes in greenhouse gases emissions over time. We decided to create lineplots and scatterplots to obtain clearer imagine of the potential trends between average temperature and emissions.

The countries including China, United States, Russia, Germany, United Kingdom, Japan, India, Indonesia, Brazil. Lineplots and scatter plots will be obtained with matplotlib library.

In [12]:
# line plot of average annual temperature vs time for China
china = Combined_df[Combined_df['Country'] == 'China']
china.plot(x='Year', y='AverageTemperature')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fca99f7550>
In [13]:
# line plot of annual green house gases emissions vs time for China
china.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fca9857700>
In [14]:
# line plot of average annual temperature vs time for United States
us = Combined_df[Combined_df['Country'] == 'United States']
us.plot(x='Year', y='AverageTemperature')
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fca9cc3fa0>
In [15]:
# line plot of annual green house gases emissions vs time for United States
us.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fca9d24bb0>
In [16]:
# line plot of average annual temperature vs time for Russia
russia = Combined_df[Combined_df['Country'] == 'Russia']
russia.plot(x='Year', y='AverageTemperature')
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaa94bf40>
In [17]:
# line plot of annual green house gases emissions vs time for Russia
russia.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaa9a7910>
In [18]:
# line plot of average annual temperature vs time for Germany
germany = Combined_df[Combined_df['Country'] == 'Germany']
germany.plot(x='Year', y='AverageTemperature')
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaaa0cf10>
In [19]:
# line plot of annual green house gases emissions vs time for Germany
germany.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaaa5f4f0>
In [20]:
# line plot of average annual temperature vs time for United Kingdom
uk = Combined_df[Combined_df['Country'] == 'United Kingdom']
uk.plot(x='Year', y='AverageTemperature')
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaaac45b0>
In [21]:
# line plot of annual green house gases emissions vs time for United Kingdom
uk.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaab281c0>
In [22]:
# line plot of average annual temperature vs time for Japan
japan = Combined_df[Combined_df['Country'] == 'Japan']
japan.plot(x='Year', y='AverageTemperature')
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaab8c1f0>
In [23]:
# line plot of annual green house gases emissions vs time for Japan
japan.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaabf1c40>
In [24]:
# line plot of average annual temperature vs time for India
india = Combined_df[Combined_df['Country'] == 'India']
india.plot(x='Year', y='AverageTemperature')
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaaa01130>
In [25]:
# line plot of annual green house gases emissions vs time for India
india.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaabdfa90>
In [26]:
# line plot of average annual temperature vs time for Indonesia
indonesia = Combined_df[Combined_df['Country'] == 'Indonesia']
indonesia.plot(x='Year', y='AverageTemperature')
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf6711c0>
In [27]:
# line plot of annual green house gases emissions vs time for Indonesia
indonesia.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf6e0400>
In [28]:
# line plot of average annual temperature vs time for Brazil
brazil = Combined_df[Combined_df['Country'] == 'Brazil']
brazil.plot(x='Year', y='AverageTemperature')
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf744bb0>
In [29]:
# line plot of annual green house gases emissions vs time for Brazil
brazil.plot(x='Year', y=['co2','methane','nitrous_oxide'])
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf797790>

After ploting all the line plot of annual emissions and temperature vs times for all the target countries, we would out some interesting trends. No all the countries show notable relationship between emissions and temperatures; However, we are able to observe some plausible trend. For Russia, when its emission drop in the earlier years and rise during the later years, the temperature follows a similar pattern. For Germany, there are no much notable change for both temperature and emissions. In United Kingdom case, both temperature and emissions dropped slightly; And for Japan, both rise slightly. For India, both temperature and emissions increases quite noticeably, same for Brazil. Indonesia shows an interesting case, where its methane emissions were the highest during the earlier years but were dropping over time, and CO2 increases dramatically over time, and overpassed methane emissions. Also, its temperature is rising. China and United States's plots do not show a very clear relationship, which we will analyze further in the next stage.

In [30]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for China
china.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf7ec6d0>
In [31]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for United States
us.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf86d670>
In [32]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for Russia
russia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf8a0310>
In [33]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for Germany
germany.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf902c10>
In [34]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for United Kingdom
uk.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf9620a0>
In [35]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for Japan
japan.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf9bdc70>
In [36]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for India
india.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafa1d160>
In [37]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for Indonesia
indonesia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafa638b0>
In [38]:
# scatter plot of average annual temperature vs. carbon dioxide emissions for Brazil
brazil.plot(x='co2', y='AverageTemperature', kind = 'scatter')
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafacb190>
In [39]:
# scatter plot of average annual temperature vs. methane emissions for China
china.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafb1bbb0>
In [40]:
# scatter plot of average annual temperature vs. methane emissions for United States
us.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafb6ce80>
In [41]:
# scatter plot of average annual temperature vs. methane emissions for Russia
russia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafc23e20>
In [42]:
# scatter plot of average annual temperature vs. methane emissions for Germany
germany.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafc79f70>
In [43]:
# scatter plot of average annual temperature vs. methane emissions for United Kingdom
uk.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafcc5d90>
In [44]:
# scatter plot of average annual temperature vs. methane emissions for Japan
japan.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[44]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaf73c670>
In [45]:
# scatter plot of average annual temperature vs. methane emissions for India
india.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[45]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafc5e970>
In [46]:
# scatter plot of average annual temperature vs. methane emissions for Indonesia
indonesia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafda3970>
In [47]:
# scatter plot of average annual temperature vs. methane emissions for Brazil
brazil.plot(x='methane', y='AverageTemperature', kind = 'scatter')
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafde79d0>
In [48]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for China
china.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafe4da90>
In [49]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for United States
us.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcafeae190>
In [50]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for Russia
russia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaff0c3a0>
In [51]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for Germany
germany.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaff5bd60>
In [52]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for United Kingdom
uk.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcaffada90>
In [53]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for Japam
japan.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcb0012610>
In [54]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for India
india.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcb006ceb0>
In [55]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for Indonesia
indonesia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcb00bb880>
In [56]:
# scatter plot of average annual temperature vs. nitrous oxide emissions for Brazil
brazil.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fcb011af10>

These plot added some more details/patterns on the trend we saw in the lineplots. However, at this point, we are still not very certain about how strong the relationship is. This leads to the next stage in the cycle, where we will try to fit linear regreesion model.

Data Analysis

As I mentioned previously, we are going to utilize machine learning algorithm and statistics to determine the relationship, we will try to observe how strong and whether it is a positive or negative relationship between the emissions and temperature. We will be building a linear regression model, in order to obtain the information and results we want.

We are using scikit-learn library here to fit our linear regression model into our data.

In [57]:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
In [58]:
# linear regession for China(co2)
x_cn_co2 = china.co2.to_numpy().reshape(-1,1)
regr_cn_co2 = linear_model.LinearRegression()
regr_cn_co2.fit(x_cn_co2, china.AverageTemperature)

avgtemp_pred_cn_co2 = regr_cn_co2.predict(x_cn_co2)

china.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(china.co2, avgtemp_pred_cn_co2, color='blue', linewidth=3)

plt.show()
In [59]:
# The Intercept
print('Intercept: \n', regr_cn_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(china.AverageTemperature, avgtemp_pred_cn_co2))
Intercept: 
 13.553411922728284
Coefficients: 
 [7.20967729e-06]
Mean squared error: 0.13
Coefficient of determination: 0.00
In [60]:
# linear regession for China(Methane)
x_cn_ch4 = china.methane.to_numpy().reshape(-1,1)
regr_cn_ch4 = linear_model.LinearRegression()
regr_cn_ch4.fit(x_cn_ch4, china.AverageTemperature)

avgtemp_pred_cn_ch4 = regr_cn_ch4.predict(x_cn_ch4)

china.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(china.methane, avgtemp_pred_cn_ch4, color='blue', linewidth=3)

plt.show()
In [61]:
# The Intercept
print('Intercept: \n', regr_cn_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(china.AverageTemperature, avgtemp_pred_cn_ch4))
Intercept: 
 13.696378890732431
Coefficients: 
 [-0.00012424]
Mean squared error: 0.13
Coefficient of determination: 0.00
In [62]:
# linear regession for China(nitrous oxide)
x_cn_n2o = china.nitrous_oxide.to_numpy().reshape(-1,1)
regr_cn_n2o = linear_model.LinearRegression()
regr_cn_n2o.fit(x_cn_n2o, china.AverageTemperature)

avgtemp_pred_cn_n2o = regr_cn_n2o.predict(x_cn_n2o)

china.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(china.nitrous_oxide, avgtemp_pred_cn_n2o, color='blue', linewidth=3)

plt.show()
In [63]:
# The Intercept
print('Intercept: \n', regr_cn_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(china.AverageTemperature, avgtemp_pred_cn_n2o))
Intercept: 
 13.228888205190115
Coefficients: 
 [0.00089151]
Mean squared error: 0.12
Coefficient of determination: 0.02
In [64]:
# linear regession for United States(co2)
x_us_co2 = us.co2.to_numpy().reshape(-1,1)
regr_us_co2 = linear_model.LinearRegression()
regr_us_co2.fit(x_us_co2, us.AverageTemperature)

avgtemp_pred_us_co2 = regr_us_co2.predict(x_us_co2)

us.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(us.co2, avgtemp_pred_us_co2, color='blue', linewidth=3)

plt.show()
In [65]:
# The Intercept
print('Intercept: \n', regr_us_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(us.AverageTemperature, avgtemp_pred_us_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(us.AverageTemperature, avgtemp_pred_us_co2))
Intercept: 
 13.8185978249816
Coefficients: 
 [0.00021197]
Mean squared error: 0.10
Coefficient of determination: 0.05
In [66]:
# linear regession for United States(methane)
x_us_ch4 = us.methane.to_numpy().reshape(-1,1)
regr_us_ch4 = linear_model.LinearRegression()
regr_us_ch4.fit(x_us_ch4, us.AverageTemperature)

avgtemp_pred_us_ch4 = regr_us_ch4.predict(x_us_ch4)

us.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(us.methane, avgtemp_pred_us_ch4, color='blue', linewidth=3)

plt.show()
In [67]:
# The Intercept
print('Intercept: \n', regr_us_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(us.AverageTemperature, avgtemp_pred_us_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(us.AverageTemperature, avgtemp_pred_us_ch4))
Intercept: 
 17.50949548410453
Coefficients: 
 [-0.00345141]
Mean squared error: 0.08
Coefficient of determination: 0.26
In [68]:
# linear regession for United States(nitrous oxide)
x_us_n2o = us.nitrous_oxide.to_numpy().reshape(-1,1)
regr_us_n2o = linear_model.LinearRegression()
regr_us_n2o.fit(x_us_n2o, us.AverageTemperature)

avgtemp_pred_us_n2o = regr_us_n2o.predict(x_us_n2o)

us.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(us.nitrous_oxide, avgtemp_pred_us_n2o, color='blue', linewidth=3)

plt.show()
In [69]:
# The Intercept
print('Intercept: \n', regr_us_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(us.AverageTemperature, avgtemp_pred_us_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(us.AverageTemperature, avgtemp_pred_us_n2o))
Intercept: 
 17.666943128772573
Coefficients: 
 [-0.01001773]
Mean squared error: 0.09
Coefficient of determination: 0.08
In [70]:
# linear regession for Russia(co2)
x_rs_co2 = russia.co2.to_numpy().reshape(-1,1)
regr_rs_co2 = linear_model.LinearRegression()
regr_rs_co2.fit(x_rs_co2, russia.AverageTemperature)

avgtemp_pred_rs_co2 = regr_rs_co2.predict(x_rs_co2)

russia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.co2, avgtemp_pred_rs_co2, color='blue', linewidth=3)

plt.show()
In [71]:
# The Intercept
print('Intercept: \n', regr_rs_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(russia.AverageTemperature, avgtemp_pred_rs_co2))
Intercept: 
 4.446847032275393
Coefficients: 
 [-2.18351431e-06]
Mean squared error: 0.31
Coefficient of determination: 0.00
In [72]:
# linear regession for Russia(methane)
x_rs_ch4 = russia.methane.to_numpy().reshape(-1,1)
regr_rs_ch4 = linear_model.LinearRegression()
regr_rs_ch4.fit(x_rs_ch4, russia.AverageTemperature)

avgtemp_pred_rs_ch4 = regr_rs_ch4.predict(x_rs_ch4)

russia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.methane, avgtemp_pred_rs_ch4, color='blue', linewidth=3)

plt.show()
In [73]:
# The Intercept
print('Intercept: \n', regr_rs_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(russia.AverageTemperature, avgtemp_pred_rs_ch4))
Intercept: 
 3.1598352471911335
Coefficients: 
 [0.00166606]
Mean squared error: 0.29
Coefficient of determination: 0.09
In [74]:
# linear regession for Russia(nitrous oxide)
x_rs_n2o = russia.nitrous_oxide.to_numpy().reshape(-1,1)
regr_rs_n2o = linear_model.LinearRegression()
regr_rs_n2o.fit(x_rs_n2o, russia.AverageTemperature)

avgtemp_pred_rs_n2o = regr_rs_n2o.predict(x_rs_n2o)

russia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.nitrous_oxide, avgtemp_pred_rs_n2o, color='blue', linewidth=3)

plt.show()
In [75]:
# The Intercept
print('Intercept: \n', regr_rs_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(russia.AverageTemperature, avgtemp_pred_rs_n2o))
Intercept: 
 4.74884545654405
Coefficients: 
 [-0.00415526]
Mean squared error: 0.31
Coefficient of determination: 0.02
In [76]:
# linear regession for Germany(co2)
x_gr_co2 = germany.co2.to_numpy().reshape(-1,1)
regr_gr_co2 = linear_model.LinearRegression()
regr_gr_co2.fit(x_gr_co2, germany.AverageTemperature)

avgtemp_pred_gr_co2 = regr_gr_co2.predict(x_gr_co2)

germany.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.co2, avgtemp_pred_gr_co2, color='blue', linewidth=3)

plt.show()
In [77]:
# The Intercept
print('Intercept: \n', regr_gr_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(germany.AverageTemperature, avgtemp_pred_gr_co2))
Intercept: 
 11.382701348475955
Coefficients: 
 [-0.00200467]
Mean squared error: 0.36
Coefficient of determination: 0.04
In [78]:
# linear regession for Germany(methane)
x_gr_ch4 = germany.methane.to_numpy().reshape(-1,1)
regr_gr_ch4 = linear_model.LinearRegression()
regr_gr_ch4.fit(x_gr_ch4, germany.AverageTemperature)

avgtemp_pred_gr_ch4 = regr_gr_ch4.predict(x_gr_ch4)

germany.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.methane, avgtemp_pred_gr_ch4, color='blue', linewidth=3)

plt.show()
In [79]:
# The Intercept
print('Intercept: \n', regr_gr_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(germany.AverageTemperature, avgtemp_pred_gr_ch4))
Intercept: 
 10.027880274472775
Coefficients: 
 [-0.00509509]
Mean squared error: 0.36
Coefficient of determination: 0.04
In [80]:
# linear regession for Germany(nitrous oxide)
x_gr_n2o = germany.nitrous_oxide.to_numpy().reshape(-1,1)
regr_gr_n2o = linear_model.LinearRegression()
regr_gr_n2o.fit(x_gr_n2o, germany.AverageTemperature)

avgtemp_pred_gr_n2o = regr_gr_n2o.predict(x_gr_n2o)

germany.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.nitrous_oxide, avgtemp_pred_gr_n2o, color='blue', linewidth=3)

plt.show()
In [81]:
# The Intercept
print('Intercept: \n', regr_gr_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(germany.AverageTemperature, avgtemp_pred_gr_n2o))
Intercept: 
 10.375912034939192
Coefficients: 
 [-0.01606251]
Mean squared error: 0.35
Coefficient of determination: 0.07
In [82]:
# linear regession for United Kindom(co2)
x_uk_co2 = uk.co2.to_numpy().reshape(-1,1)
regr_uk_co2 = linear_model.LinearRegression()
regr_uk_co2.fit(x_uk_co2, uk.AverageTemperature)

avgtemp_pred_uk_co2 = regr_uk_co2.predict(x_uk_co2)

uk.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.co2, avgtemp_pred_uk_co2, color='blue', linewidth=3)

plt.show()
In [83]:
# The Intercept
print('Intercept: \n', regr_uk_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(uk.AverageTemperature, avgtemp_pred_uk_co2))
Intercept: 
 10.843796762517986
Coefficients: 
 [-0.00148102]
Mean squared error: 0.24
Coefficient of determination: 0.01
In [84]:
# linear regession for United Kindom(methane)
x_uk_ch4 = uk.methane.to_numpy().reshape(-1,1)
regr_uk_ch4 = linear_model.LinearRegression()
regr_uk_ch4.fit(x_uk_ch4, uk.AverageTemperature)

avgtemp_pred_uk_ch4 = regr_uk_ch4.predict(x_uk_ch4)

uk.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.methane, avgtemp_pred_uk_ch4, color='blue', linewidth=3)

plt.show()
In [85]:
# The Intercept
print('Intercept: \n', regr_uk_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(uk.AverageTemperature, avgtemp_pred_uk_ch4))
Intercept: 
 10.410842468790287
Coefficients: 
 [-0.00376264]
Mean squared error: 0.23
Coefficient of determination: 0.04
In [86]:
# linear regession for United Kindom(nitrous oxide)
x_uk_n2o = uk.nitrous_oxide.to_numpy().reshape(-1,1)
regr_uk_n2o = linear_model.LinearRegression()
regr_uk_n2o.fit(x_uk_n2o, uk.AverageTemperature)

avgtemp_pred_uk_n2o = regr_uk_ch4.predict(x_uk_n2o)

uk.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.nitrous_oxide, avgtemp_pred_uk_n2o, color='blue', linewidth=3)

plt.show()
In [87]:
# The Intercept
print('Intercept: \n', regr_uk_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(uk.AverageTemperature, avgtemp_pred_uk_n2o))
Intercept: 
 10.568173656291883
Coefficients: 
 [-0.01390929]
Mean squared error: 0.29
Coefficient of determination: -0.21
In [88]:
# linear regession for Japan(co2)
x_jp_co2 = japan.co2.to_numpy().reshape(-1,1)
regr_jp_co2 = linear_model.LinearRegression()
regr_jp_co2.fit(x_jp_co2, japan.AverageTemperature)

avgtemp_pred_jp_co2 = regr_jp_co2.predict(x_jp_co2)

japan.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.co2, avgtemp_pred_jp_co2, color='blue', linewidth=3)

plt.show()
In [89]:
# The Intercept
print('Intercept: \n', regr_jp_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(japan.AverageTemperature, avgtemp_pred_jp_co2))
Intercept: 
 14.863437379564203
Coefficients: 
 [-0.0005123]
Mean squared error: 0.17
Coefficient of determination: 0.00
In [90]:
# linear regession for Japan(methane)
x_jp_ch4 = japan.methane.to_numpy().reshape(-1,1)
regr_jp_ch4 = linear_model.LinearRegression()
regr_jp_ch4.fit(x_jp_ch4, japan.AverageTemperature)

avgtemp_pred_jp_ch4 = regr_jp_ch4.predict(x_jp_ch4)

japan.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.methane, avgtemp_pred_jp_ch4, color='blue', linewidth=3)

plt.show()
In [91]:
# The Intercept
print('Intercept: \n', regr_jp_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(japan.AverageTemperature, avgtemp_pred_jp_ch4))
Intercept: 
 14.5745226941419
Coefficients: 
 [-0.01146415]
Mean squared error: 0.16
Coefficient of determination: 0.02
In [92]:
# linear regession for Japan(nitrous oxide)
x_jp_n2o = japan.nitrous_oxide.to_numpy().reshape(-1,1)
regr_jp_n2o = linear_model.LinearRegression()
regr_jp_n2o.fit(x_jp_n2o, japan.AverageTemperature)

avgtemp_pred_jp_n2o = regr_jp_n2o.predict(x_jp_n2o)

japan.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.nitrous_oxide, avgtemp_pred_jp_n2o, color='blue', linewidth=3)

plt.show()
In [93]:
# The Intercept
print('Intercept: \n', regr_jp_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(japan.AverageTemperature, avgtemp_pred_jp_n2o))
Intercept: 
 14.542585721227756
Coefficients: 
 [-0.01277038]
Mean squared error: 0.16
Coefficient of determination: 0.02
In [94]:
# linear regession for India(co2)
x_in_co2 = india.co2.to_numpy().reshape(-1,1)
regr_in_co2 = linear_model.LinearRegression()
regr_in_co2.fit(x_in_co2, india.AverageTemperature)

avgtemp_pred_in_co2 = regr_in_co2.predict(x_in_co2)

india.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(india.co2, avgtemp_pred_in_co2, color='blue', linewidth=3)

plt.show()
In [95]:
# The Intercept
print('Intercept: \n', regr_in_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(india.AverageTemperature, avgtemp_pred_in_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(india.AverageTemperature, avgtemp_pred_in_co2))
Intercept: 
 25.63055032183447
Coefficients: 
 [0.00036624]
Mean squared error: 0.04
Coefficient of determination: 0.31
In [96]:
# linear regession for India(methane)
x_in_ch4 = india.methane.to_numpy().reshape(-1,1)
regr_in_ch4 = linear_model.LinearRegression()
regr_in_ch4.fit(x_in_ch4, india.AverageTemperature)

avgtemp_pred_in_ch4 = regr_in_ch4.predict(x_in_ch4)

india.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(india.methane, avgtemp_pred_in_ch4, color='blue', linewidth=3)

plt.show()
In [97]:
# The Intercept
print('Intercept: \n', regr_in_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(india.AverageTemperature, avgtemp_pred_in_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(india.AverageTemperature, avgtemp_pred_in_ch4))
Intercept: 
 23.99385579555225
Coefficients: 
 [0.00343246]
Mean squared error: 0.04
Coefficient of determination: 0.36
In [98]:
# linear regession for India(nitrous oxide)
x_in_n2o = india.nitrous_oxide.to_numpy().reshape(-1,1)
regr_in_n2o = linear_model.LinearRegression()
regr_in_n2o.fit(x_in_n2o, india.AverageTemperature)

avgtemp_pred_in_n2o = regr_in_n2o.predict(x_in_n2o)

india.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(india.nitrous_oxide, avgtemp_pred_in_n2o, color='blue', linewidth=3)

plt.show()
In [99]:
# The Intercept
print('Intercept: \n', regr_in_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(india.AverageTemperature, avgtemp_pred_in_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(india.AverageTemperature, avgtemp_pred_in_n2o))
Intercept: 
 25.19289698359389
Coefficients: 
 [0.00447097]
Mean squared error: 0.04
Coefficient of determination: 0.33
In [100]:
# linear regession for Indonesia(co2)
x_id_co2 = indonesia.co2.to_numpy().reshape(-1,1)
regr_id_co2 = linear_model.LinearRegression()
regr_id_co2.fit(x_id_co2, indonesia.AverageTemperature)

avgtemp_pred_id_co2 = regr_id_co2.predict(x_id_co2)

indonesia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.co2, avgtemp_pred_id_co2, color='blue', linewidth=3)

plt.show()
In [101]:
# The Intercept
print('Intercept: \n', regr_id_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(indonesia.AverageTemperature, avgtemp_pred_id_co2))
Intercept: 
 26.53017558299921
Coefficients: 
 [0.00047174]
Mean squared error: 0.03
Coefficient of determination: 0.08
In [102]:
# linear regession for Indonesia(methane)
x_id_ch4 = indonesia.methane.to_numpy().reshape(-1,1)
regr_id_ch4 = linear_model.LinearRegression()
regr_id_ch4.fit(x_id_ch4, indonesia.AverageTemperature)

avgtemp_pred_id_ch4 = regr_id_ch4.predict(x_id_ch4)

indonesia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.methane, avgtemp_pred_id_ch4, color='blue', linewidth=3)

plt.show()
In [103]:
# The Intercept
print('Intercept: \n', regr_id_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(indonesia.AverageTemperature, avgtemp_pred_id_ch4))
Intercept: 
 26.505283771881405
Coefficients: 
 [0.00046615]
Mean squared error: 0.03
Coefficient of determination: 0.03
In [104]:
# linear regession for Indonesia(nitrous oxide)
x_id_n2o = indonesia.nitrous_oxide.to_numpy().reshape(-1,1)
regr_id_n2o = linear_model.LinearRegression()
regr_id_n2o.fit(x_id_n2o, indonesia.AverageTemperature)

avgtemp_pred_id_n2o = regr_id_n2o.predict(x_id_n2o)

indonesia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.nitrous_oxide, avgtemp_pred_id_n2o, color='blue', linewidth=3)

plt.show()
In [105]:
# The Intercept
print('Intercept: \n', regr_id_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(indonesia.AverageTemperature, avgtemp_pred_id_n2o))
Intercept: 
 26.083515063351648
Coefficients: 
 [0.00745369]
Mean squared error: 0.02
Coefficient of determination: 0.23
In [106]:
# linear regession for Brazil(co2)
x_bz_co2 = brazil.co2.to_numpy().reshape(-1,1)
regr_bz_co2 = linear_model.LinearRegression()
regr_bz_co2.fit(x_bz_co2, brazil.AverageTemperature)

avgtemp_pred_bz_co2 = regr_bz_co2.predict(x_bz_co2)

brazil.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.co2, avgtemp_pred_bz_co2, color='blue', linewidth=3)

plt.show()
In [107]:
# The Intercept
print('Intercept: \n', regr_bz_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(brazil.AverageTemperature, avgtemp_pred_bz_co2))
Intercept: 
 22.29482275634087
Coefficients: 
 [0.00164796]
Mean squared error: 0.06
Coefficient of determination: 0.18
In [108]:
# linear regession for Brazil(methane)
x_bz_ch4 = brazil.methane.to_numpy().reshape(-1,1)
regr_bz_ch4 = linear_model.LinearRegression()
regr_bz_ch4.fit(x_bz_ch4, brazil.AverageTemperature)

avgtemp_pred_bz_ch4 = regr_bz_ch4.predict(x_bz_ch4)

brazil.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.methane, avgtemp_pred_bz_ch4, color='blue', linewidth=3)

plt.show()
In [109]:
# The Intercept
print('Intercept: \n', regr_bz_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(brazil.AverageTemperature, avgtemp_pred_bz_ch4))
Intercept: 
 21.848086156425126
Coefficients: 
 [0.00263547]
Mean squared error: 0.05
Coefficient of determination: 0.21
In [110]:
# linear regession for Brazil(nitrous oxide)
x_bz_n2o = brazil.nitrous_oxide.to_numpy().reshape(-1,1)
regr_bz_n2o = linear_model.LinearRegression()
regr_bz_n2o.fit(x_bz_n2o, brazil.AverageTemperature)

avgtemp_pred_bz_n2o = regr_bz_n2o.predict(x_bz_n2o)

brazil.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.nitrous_oxide, avgtemp_pred_bz_n2o, color='blue', linewidth=3)

plt.show()
In [111]:
# The Intercept
print('Intercept: \n', regr_bz_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(brazil.AverageTemperature, avgtemp_pred_bz_n2o))
Intercept: 
 22.014660266194007
Coefficients: 
 [0.00564099]
Mean squared error: 0.05
Coefficient of determination: 0.21

After looking at all the plot, we can see that almost all of our target countries have a fairly weak linear relationship. The r-sqaure for china's co2 is 0.13 with a very small positive coefficient, methane is 0.13 with a small negative coefficient, and nitrous oxide is 0.12 with a small positive coefficient. For United States, the r-square for co2 is 0.1 with small positive coefficient, methane is 0.08 with small negative coefficient, and nitrous oxide is 0.09 with small negative coefficient. For Russia, they have a relatively higher r-sqaures, the r-square for co2 is 0.31, with a small negative coeffient, methane is 0.29 with a small positive coefficient, and nitrous oxide is 0.31 with small negative coefficient. For Germany, they also have a relatively higher r-sqaures. The r-square for co2 is 0.36 with small negative coefficient, methane is 0.36 with small negative coefficient, and nitrous oxide is 0.35 with small negative coefficient. For United Kingdom, they also have a relatively higher r-sqaures. The r-square for co2 is 0.24 with small negative coefficient, methane is 0.23 with small negative coefficient, and nitrous oxide is 0.29 with small negative coefficient. For Japan, the r-square for co2 is 0.17 with small negative coefficient, methane is 0.16 with small negative coefficient, and nitrous oxide is 0.16 with small negative coefficient. For India, the r-square for co2 is 0.04 with small positive coefficient, methane is 0.04 with small positive coefficient, and nitrous oxide is 0.04 with small positive coefficient. For Indonesia, the r-square for co2 is 0.03 with small positive coefficient, methane is 0.03 with small positive coefficient, and nitrous oxide is 0.02 with small positive coefficient. For Brazil, the r-square for co2 is 0.06 with small positive coefficient, methane is 0.05 with small positive coefficient, and nitrous oxide is 0.05 with small positive coefficient.

We can see that many of the target countries has a negative coefficient; however, the r-square we get are pretty small. In this case, this might prove that our hypothesis might be rejected, but we still have not obtain a r-sqaure value from all the country, so in the next part, we will try to get the r-square value for all the countries. We will display them through a choropleth world map and by then we might be able to conclude whether if our hypothesis stands, whether annual green house gases and annual average temperature by country has any relationships.

In [112]:
# obtain the r-square and coefficient values for all the countries(co2) in our dataset
countries = np.unique(Combined_df['Country'])
coef_co2 = {}
rsq_co2 = {}
for country in countries:
    con = Combined_df[Combined_df['Country'] == country]
    x_all_co2 = con.co2.to_numpy().reshape(-1,1)
    regr_all_co2 = linear_model.LinearRegression()
    regr_all_co2.fit(x_all_co2, con.AverageTemperature)
    avgtemp_pred_all_co2 = regr_all_co2.predict(x_all_co2)
    coef_co2[country] = regr_all_co2.coef_[0]
    rsq_co2[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_co2)
In [113]:
Combined_df['coef_co2'] = Combined_df['Country'].apply(lambda x: coef_co2[x])
Combined_df['rsq_co2'] = Combined_df['Country'].apply(lambda x: rsq_co2[x])

Combined_df.head(10)
Out[113]:
Year CountryCode Country AverageTemperature co2 methane nitrous_oxide coef_co2 rsq_co2
0 1990 AFG Afghanistan 14.913135 2.602 8.97 3.25 0.006100 0.001788
1 1990 ALB Albania 16.371083 5.511 3.67 1.59 0.090572 0.060596
2 1990 DZA Algeria 18.818533 76.737 22.14 5.00 0.011036 0.112170
3 1990 AGO Angola 22.333861 5.089 39.07 18.43 -0.005921 0.032005
4 1990 ARG Argentina 17.410458 111.890 114.00 37.97 0.002525 0.043044
5 1990 ARM Armenia 9.097833 18.116 3.59 0.87 -0.022160 0.013987
6 1990 AUS Australia 17.351583 278.424 162.19 82.27 0.003767 0.449408
7 1990 AUT Austria 7.460117 62.323 11.57 4.90 -0.003891 0.001363
8 1990 AZE Azerbaijan 12.014083 51.691 16.39 2.84 -0.058197 0.416205
9 1990 BHS Bahamas 25.742917 1.839 0.24 0.07 -0.425460 0.076663
In [115]:
# choropleth world map for all countries' co2 emissions r-square values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "rsq_co2",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()
In [116]:
# choropleth world map for all countries' co2 emissions coefficient values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "coef_co2",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()
In [117]:
# obtain the r-square and coefficient values for all the countries(methane) in our dataset
countries = np.unique(Combined_df['Country'])
coef_ch4 = {}
rsq_ch4 = {}
for country in countries:
    con = Combined_df[Combined_df['Country'] == country]
    x_all_ch4 = con.methane.to_numpy().reshape(-1,1)
    regr_all_ch4 = linear_model.LinearRegression()
    regr_all_ch4.fit(x_all_ch4, con.AverageTemperature)
    avgtemp_pred_all_ch4 = regr_all_ch4.predict(x_all_ch4)
    coef_ch4[country] = regr_all_ch4.coef_[0]
    rsq_ch4[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_ch4)
In [118]:
Combined_df['coef_ch4'] = Combined_df['Country'].apply(lambda x: coef_ch4[x])
Combined_df['rsq_ch4'] = Combined_df['Country'].apply(lambda x: rsq_ch4[x])

Combined_df.head(10)
Out[118]:
Year CountryCode Country AverageTemperature co2 methane nitrous_oxide coef_co2 rsq_co2 coef_ch4 rsq_ch4
0 1990 AFG Afghanistan 14.913135 2.602 8.97 3.25 0.006100 0.001788 0.002795 0.003859
1 1990 ALB Albania 16.371083 5.511 3.67 1.59 0.090572 0.060596 -0.437345 0.080718
2 1990 DZA Algeria 18.818533 76.737 22.14 5.00 0.011036 0.112170 0.027323 0.358788
3 1990 AGO Angola 22.333861 5.089 39.07 18.43 -0.005921 0.032005 0.003363 0.004523
4 1990 ARG Argentina 17.410458 111.890 114.00 37.97 0.002525 0.043044 0.009172 0.052720
5 1990 ARM Armenia 9.097833 18.116 3.59 0.87 -0.022160 0.013987 -0.861487 0.268389
6 1990 AUS Australia 17.351583 278.424 162.19 82.27 0.003767 0.449408 -0.003313 0.039817
7 1990 AUT Austria 7.460117 62.323 11.57 4.90 -0.003891 0.001363 -0.114486 0.068813
8 1990 AZE Azerbaijan 12.014083 51.691 16.39 2.84 -0.058197 0.416205 0.024895 0.194911
9 1990 BHS Bahamas 25.742917 1.839 0.24 0.07 -0.425460 0.076663 0.967196 0.017537
In [119]:
# choropleth world map for all countries' methane emissions r-square values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "rsq_ch4",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()
In [120]:
# choropleth world map for all countries' methane emissions coefficient values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "coef_ch4",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()
In [121]:
# obtain the r-square and coefficient values for all the countries(nitrous oxide) in our dataset
countries = np.unique(Combined_df['Country'])
coef_n2o = {}
rsq_n2o = {}
for country in countries:
    con = Combined_df[Combined_df['Country'] == country]
    x_all_n2o = con.nitrous_oxide.to_numpy().reshape(-1,1)
    regr_all_n2o = linear_model.LinearRegression()
    regr_all_n2o.fit(x_all_n2o, con.AverageTemperature)
    avgtemp_pred_all_n2o = regr_all_n2o.predict(x_all_n2o)
    coef_n2o[country] = regr_all_n2o.coef_[0]
    rsq_n2o[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_n2o)
In [122]:
Combined_df['coef_n2o'] = Combined_df['Country'].apply(lambda x: coef_n2o[x])
Combined_df['rsq_n2o'] = Combined_df['Country'].apply(lambda x: rsq_n2o[x])

Combined_df.head(10)
Out[122]:
Year CountryCode Country AverageTemperature co2 methane nitrous_oxide coef_co2 rsq_co2 coef_ch4 rsq_ch4 coef_n2o rsq_n2o
0 1990 AFG Afghanistan 14.913135 2.602 8.97 3.25 0.006100 0.001788 0.002795 0.003859 0.070256 0.028072
1 1990 ALB Albania 16.371083 5.511 3.67 1.59 0.090572 0.060596 -0.437345 0.080718 0.088130 0.000469
2 1990 DZA Algeria 18.818533 76.737 22.14 5.00 0.011036 0.112170 0.027323 0.358788 0.124579 0.241261
3 1990 AGO Angola 22.333861 5.089 39.07 18.43 -0.005921 0.032005 0.003363 0.004523 0.068405 0.082948
4 1990 ARG Argentina 17.410458 111.890 114.00 37.97 0.002525 0.043044 0.009172 0.052720 0.015705 0.038514
5 1990 ARM Armenia 9.097833 18.116 3.59 0.87 -0.022160 0.013987 -0.861487 0.268389 0.061851 0.000118
6 1990 AUS Australia 17.351583 278.424 162.19 82.27 0.003767 0.449408 -0.003313 0.039817 -0.000022 0.000002
7 1990 AUT Austria 7.460117 62.323 11.57 4.90 -0.003891 0.001363 -0.114486 0.068813 -0.123713 0.014271
8 1990 AZE Azerbaijan 12.014083 51.691 16.39 2.84 -0.058197 0.416205 0.024895 0.194911 0.646384 0.138101
9 1990 BHS Bahamas 25.742917 1.839 0.24 0.07 -0.425460 0.076663 0.967196 0.017537 3.196322 0.029030
In [123]:
# choropleth world map for all countries' nitrous oxide emissions r-square values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "rsq_n2o",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()
In [124]:
# choropleth world map for all countries' nitrous oxide emissions coefficient values
fig = px.choropleth(data_frame = Combined_df,
                    locations= "CountryCode",
                    color= "coef_n2o",  
                    hover_name= "Country",
                    color_continuous_scale= 'YlOrRd')

fig.show()

After look at the choropleth world map, where we looked at all the r-square values and coefficient of all the countries in our data set; We found very similar results to what we saw in our target countries. The r-sqaure values for all the countries are very low, some of the relatively high r-sqaure is still peaked at around 0.5-0.6, which is still fairly low. Also, the coefficient are generally very low. Now we have gained enough confident from our data analysis to draw a conclusions in the next part.

Insights learned from tutorial

This is the final stage of the cycle, where we draw conclusion based on everything we did above with our data.

We will reject our hypothesis, that there are relationship between annual green house gases emissions and annual average temperature by countries. As we can see in the last stage, most of the r-square values are very low, which resemble that there are not much of a linear relationship.

Although we rejected our hypothesis, but does this mean there are no relationship between green house gases emissions and temperature? Alsolutely not. There are always many approaches to one single problem, there are many ways to examine two features. This is what makes data science and machine learning challenging and exciting, there are always more to learn from what has already established.